
5 research outputs found

Vector-Processing for Mobile Devices: Benchmark and Analysis

Author: Das Reetuparna
Fujiki Daichi
Khadem Alireza
Mahlke Scott
Talati Nishil
Publication date: 05/09/2023

Vector processing has become commonplace in today's CPU microarchitectures. Vector instructions improve performance and energy efficiency, which is crucial for resource-constrained mobile devices. The research community currently lacks a comprehensive benchmark suite to study the benefits of vector processing for mobile devices. This paper presents Swan, an extensive vector processing benchmark suite for mobile applications. Swan consists of a diverse set of data-parallel workloads from four commonly used mobile applications: operating system, web browser, audio/video messaging application, and PDF rendering engine. Using the Swan benchmark suite, we conduct a detailed analysis of the performance, power, and energy consumption of vectorized workloads, and show that: (a) Vectorized kernels increase the pressure on the cache hierarchy due to the higher rate of memory requests. (b) Vector processing is more beneficial for workloads with lower-precision operations and higher cache hit rates. (c) Limited instruction-level parallelism and strided memory accesses to multi-dimensional data structures prevent vector processing benefits from scaling with more SIMD functional units and wider registers. (d) Despite lower computation throughput than domain-specific accelerators, such as GPUs, vector processing outperforms these accelerators for kernels with lower operation counts. Finally, we show five common computation patterns in mobile data-parallel workloads that dominate the execution time.
Comment: 2023 IEEE International Symposium on Workload Characterization (IISWC)

arXiv.org e-Print Archive

Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design

Author: Ahmadi Agreen
Austin Todd
Behroozi Armand
Dreslinski Ronald
Kaszyk Kuba
Li Lu
Mahlke Scott
May Kyle
Morton John Magnus
Mudge Trevor
Nguyen Brandon
O'Boyle Michael F P
Sun Jiawen
Talati Nishil
Vasiladiotis Christos
Verma Tarunesh
Yang Yichen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/04/2021

Edinburgh Research Explorer

Optimizing Emerging Graph Applications Using Hardware-Software Co-Design

Author: Talati Nishil Rakeshkumar
Publication date: 01/01/2022

A graph is a ubiquitous data structure that models entities and their interactions through collections of nodes and edges. It is widely employed in several important application domains, ranging from social media, navigation tools, search engines, and physics simulations to biology. Despite its prevalence, the performance of graph algorithms on commercial platforms is limited. This is mainly due to the irregular memory accesses and convoluted control flow instructions used in graph algorithms while accessing large volumes of graph data (with billions of nodes/edges). Therefore, there is a pressing need for optimizing the performance of graph workloads. In this thesis, I present a systematic optimization study of a variety of graph workloads running on both static and dynamic graphs. At a high level, I first analyze the unique challenges and execution bottlenecks of the state-of-the-art graph software frameworks running on commercial hardware platforms. I then use the insights obtained from this analysis to propose design optimizations catered to the unique workload characteristics of a diversity of graph workloads. Specifically, first, I propose Prodigy, a hardware-software co-design solution to improve the performance of traditional graph processing algorithms (e.g., PageRank and SSSP) on multi-core CPUs. Second, I present an in-depth study of random walk–based graph learning algorithms on temporal graphs (a type of dynamic graph). This study delivers high-performance, open-source CPU and GPU implementations of important graph learning applications, conducts a detailed performance analysis, and makes recommendations for future optimizations. Third, I showcase NDMiner, a domain-specialized Near Data Processing (NDP) architecture that significantly improves the performance of Graph Pattern Mining (GPM) workloads. Last, I present Mint, a novel hardware accelerator architecture and an accompanying programming model for efficiently mining motifs in temporal graphs.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/175618/1/talatin_1.pd

Deep Blue Documents at the University of Michigan

CONCEPT: A Co

Author: Ben Perach
Heonjae Ha
Nishil Talati
Ronny Ronen
Shahar Kvatinsky
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Reconfiguration on nanocrossbar using material implication

Author: Ameya Riswadkar
C K Ramesha
DB Strukov
LO Chua
Nishil Talati
Pravin Mane
Q Xia
Ramesh Raghu
S Kvatinsky
Publication venue: 'Springer Science and Business Media LLC'

Crossref